Mining Named Entity Translation from Non Parallel Corpora

Authors

  • Rahma Sellami
  • Fatiha Sadat
  • Lamia Hadrich Belguith
Abstract

In this paper, we address the problem of mining named entity translations, such as names of persons, organizations, and locations, from non-parallel corpora. First, our study concentrates on the different forms of named entity translation. Then, we introduce a new framework to extract all types of named entity translation from a non-parallel corpus. The proposed framework combines surface-based and linguistic-based approaches. It is language-independent and does not rely on any external parallel resources such as bilingual lexicons or parallel corpora. Evaluations show that our approach for mining named entity translations from a non-parallel corpus is highly effective and consistently improves the translation quality of an Arabic-to-French machine translation system.

Similar resources

Bootstrapping Entity Translation on Weakly Comparable Corpora

This paper studies the problem of mining named entity translations from comparable corpora with some “asymmetry”. Unlike the previous approaches relying on the “symmetry” found in parallel corpora, the proposed method is tolerant to asymmetry often found in comparable corpora, by distinguishing different semantics of relations of entity pairs to selectively propagate seed entity translations on...

Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but due to the lack of such texts, non-parallel and comparable corpora are used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...

ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multilingual text resources) which are much more widely available than parallel translation data. Our presented toolkit deals with parallel content extract...

Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora

Parallel Named Entity pairs are important resources in several NLP tasks, such as CLIR and MT systems. Further, such pairs may also be used for training transliteration systems, if they are transliterations of each other. In this paper, we profile the performance of a mining methodology in mining parallel named entity transliteration pairs in English and an Indian language, Tamil, leveraging l...

Using Word Embeddings to Translate Named Entities

In this paper we investigate the usefulness of neural word embeddings in the process of translating Named Entities (NEs) from a resource-rich language to a language low on resources relevant to the task at hand, introducing a novel, yet simple way of obtaining bilingual word vectors. Inspired by observations in (Mikolov et al., 2013b), which show that training their word vector model on compara...

Publication date: 2014